Abstract

This is a study of gerrymandering in Alabama. We will test three shape-based compactness scores and assess the representativeness of districts based on prior presidential election results and race. We will then extend prior studies by calculating the representativeness of the convex hull of each district polygon.

Study Metadata

  • Key words: Alabama, gerrymandering, compactness, convex hull, political representation
  • Subject: Social and Behavioral Sciences: Geography: Geographic Information Sciences
  • Date created: 2025-02-17
  • Date modified: 2025-02-17
  • Spatial Coverage: Alabama OSM:161950
  • Spatial Resolution: Census Block Groups
  • Spatial Reference System: EPSG:4269 NAD 1983 Geographic Coordinate System
  • Temporal Coverage: 2020-2024 population and voting data
  • Temporal Resolution: Decennial census

Study design

This is an original study based on literature on gerrymandering metrics.

It is an exploratory study to evaluate the usefulness of a new gerrymandering metric based on the convex hull of a congressional district: the representativeness of the population inside the convex hull compared to that of the congressional district itself.
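As an illustration only, the proposed metric could be written as follows. The notation is introduced here and is not defined elsewhere in the study. For any area X, P_X is the percentage of its voting age population that is Black or African American (or, analogously, that voted for Biden). For district d with convex hull H(d), Δ_d is the absolute representational difference.

```latex
P_X = 100 \cdot \frac{\text{Black}_X}{\text{Total}_X},
\qquad
\Delta_d = \left| P_{H(d)} - P_d \right|
```

A larger Δ_d would indicate that the district's population differs more from the population of the compact region that encloses it.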

Materials and procedure

Computational environment

I plan on using package … for …

Data and variables

We plan on using data sources …. , … ….

Precincts 2020

  • Title: Voting Precincts 2020
  • Abstract: Alabama voting data for 2020 elections by precinct.
  • Spatial Coverage: Alabama
  • Spatial Resolution: Voting precincts
  • Spatial Reference System: EPSG:4269 NAD 1983 geographic coordinate system
  • Temporal Coverage: precincts used for tabulating the 2020 census
  • Temporal Resolution: annual election
  • Lineage: Saved in GeoPackage format. Processing prior to download is explained in al_vest_20_validation_report.pdf and readme_al_vest_20.txt.
  • Distribution: Data are available from the Redistricting Data Hub with a free login.
  • Constraints: Permitted for noncommercial and nonpartisan use only. Copyright and use constraints are explained in redistrictingdatahub_legal.txt.
  • Data Quality: State any planned quality assessment
  • Variables: For each variable, enter the following information. If you have two or more variables per data source, you may want to present this information in table form (shown below)
    • Label: variable name as used in the data or code
    • Alias: intuitive natural language name
    • Definition: Short description or definition of the variable. Include measurement units in description.
    • Type: data type, e.g. character string, integer, real
    • Accuracy: e.g. uncertainty of measurements
    • Domain: Expected range of Maximum and Minimum of numerical data, or codes or categories of nominal data, or reference to a standard codebook
    • Missing Data Value(s): Values used to represent missing data and frequency of missing data observations
    • Missing Data Frequency: Frequency of missing data observations: not yet known for data to be collected
Label Alias Definition Type Accuracy Domain Missing Data Value(s) Missing Data Frequency
VTDST20 Voting district ID
GEOID20 Unique geographic ID
G20PRETRU total votes for Trump in 2020
G20PREBID total votes for Biden in 2020

Decennial Census

We acquire decennial census data for block groups using the tidycensus package. First, query the metadata for the PL (Public Law 94-171 redistricting) data series.

The issue in the 2023 court cases on Alabama’s gerrymandering was a racial gerrymander discriminating against people identifying as Black or African American. Therefore, we will analyze people of voting age (18 or older) who identify as Black or African American, either alone or in any combination with other races. These data are found in table P3.

Query the public law data series table P3 on “race for the population 18 years and over”.

## Reading layer `block_groups' from data source 
##   `C:\git\josephholler\OR-Gerrymander-Alabama\data\raw\public\block_groups.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 3925 features and 83 fields (with 1 geometry empty)
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.47323 ymin: 30.22333 xmax: -84.88908 ymax: 35.00803
## Geodetic CRS:  NAD83

Prior observations

We have previously investigated the compactness scores of Alabama’s congressional districts as well as the percentage of Biden voters from the 2020 elections and the percentage of the population 18 years or older that is not Hispanic and is Black or African American.

We have never calculated the minimum bounding circle or convex hulls of Alabama’s congressional districts.

Bias and threats to validity

This study is explicitly an investigation into the modifiable areal unit problem. Aspects of the study are extremely sensitive to the combination of edge effects and scale: complex borders formed by natural features, e.g. coastlines or rivers, vary greatly in perimeter depending on the scale of analysis. We hope that this study will, in part, establish a method that is more robust (less sensitive) to the threats to validity that scale and edge effects pose in studies of gerrymandering and district shapes.
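A minimal sketch of the scale sensitivity described above, written in Python as a language-neutral illustration (the study itself works in R). It assumes planar coordinates and uses a hypothetical jagged square as a stand-in for a coastline digitized at two levels of detail. The bottom edge of the square zigzags inward. By construction the area is identical at both resolutions, so only the perimeter changes. The perimeter-based Polsby-Popper score therefore falls sharply as detail increases, while the ratio of area to convex hull area does not change.

```python
import math

def shoelace_area(ring):
    """Planar polygon area; ring lists each vertex once (first vertex not repeated)."""
    n = len(ring)
    twice = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
                for i in range(n))
    return abs(twice) / 2

def perimeter(ring):
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def jagged_square(n_teeth, depth=1.0, side=10.0):
    """Hypothetical 'coastline': a square whose bottom edge zigzags inward.

    The teeth always remove side * depth / 2 of area in total, so only the
    perimeter depends on n_teeth.
    """
    n_seg = 2 * n_teeth
    bottom = [(side * i / n_seg, 0.0 if i % 2 == 0 else depth) for i in range(n_seg + 1)]
    return bottom + [(side, side), (0.0, side)]

def compactness(ring):
    """Return (Polsby-Popper score, area / convex hull area) for a planar ring."""
    a = shoelace_area(ring)
    pp = 4 * math.pi * a / perimeter(ring) ** 2
    return pp, a / shoelace_area(convex_hull(ring))

for teeth in (5, 50):
    pp, hull_ratio = compactness(jagged_square(teeth))
    print(f"{teeth:>2} teeth: Polsby-Popper {pp:.3f}, area / hull area {hull_ratio:.3f}")
# Polsby-Popper drops from about 0.613 to about 0.070; the hull ratio stays 0.950.
```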

Data transformations

Describe all data transformations planned to prepare data sources for analysis. This section should explain in the fullest possible detail how to transform data from the raw state at the time of acquisition or observation to the pre-processed derived state ready for the main analysis, including steps to check and mitigate sources of bias and threats to validity. The method may anticipate contingencies, e.g. tests for normality and alternative decisions to make based on the results of the test. More specifically, describe all the geographic and variable transformations required to prepare the input data described in the data and variables section above to match the study’s spatio-temporal characteristics, as described in the study metadata and study design sections. Visual workflow diagrams may help communicate the methodology in this section.

Examples of geographic transformations include coordinate system transformations, aggregation, disaggregation, spatial interpolation, distance calculations, zonal statistics, etc.

Examples of variable transformations include standardization, normalization, constructed variables, imputation, classification, etc.

Be sure to include any steps planned to exclude observations with missing or outlier data, to group observations by attribute or geographic criteria, or to impute missing data or apply spatial or temporal interpolation.

Calculate Census Variables

Sum the number of people identifying as Black or African American, either alone or in any combination with other races. First, make a list of all the variables that include people identifying as Black or African American.

X name label
151 P3_004N !!Total:!!Population of one race:!!Black or African American alone
158 P3_011N !!Total:!!Population of two or more races:!!Population of two races:!!White; Black or African American
163 P3_016N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; American Indian and Alaska Native
164 P3_017N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; Asian
165 P3_018N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; Native Hawaiian and Other Pacific Islander
166 P3_019N !!Total:!!Population of two or more races:!!Population of two races:!!Black or African American; Some Other Race
174 P3_027N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; American Indian and Alaska Native
175 P3_028N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; Asian
176 P3_029N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; Native Hawaiian and Other Pacific Islander
177 P3_030N !!Total:!!Population of two or more races:!!Population of three races:!!White; Black or African American; Some Other Race
184 P3_037N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; American Indian and Alaska Native; Asian
185 P3_038N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander
186 P3_039N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; American Indian and Alaska Native; Some Other Race
187 P3_040N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; Asian; Native Hawaiian and Other Pacific Islander
188 P3_041N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; Asian; Some Other Race
189 P3_042N !!Total:!!Population of two or more races:!!Population of three races:!!Black or African American; Native Hawaiian and Other Pacific Islander; Some Other Race
195 P3_048N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; American Indian and Alaska Native; Asian
196 P3_049N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander
197 P3_050N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; American Indian and Alaska Native; Some Other Race
198 P3_051N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; Asian; Native Hawaiian and Other Pacific Islander
199 P3_052N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; Asian; Some Other Race
200 P3_053N !!Total:!!Population of two or more races:!!Population of four races:!!White; Black or African American; Native Hawaiian and Other Pacific Islander; Some Other Race
205 P3_058N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander
206 P3_059N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; American Indian and Alaska Native; Asian; Some Other Race
207 P3_060N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander; Some Other Race
208 P3_061N !!Total:!!Population of two or more races:!!Population of four races:!!Black or African American; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race
211 P3_064N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander
212 P3_065N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; American Indian and Alaska Native; Asian; Some Other Race
213 P3_066N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; American Indian and Alaska Native; Native Hawaiian and Other Pacific Islander; Some Other Race
214 P3_067N !!Total:!!Population of two or more races:!!Population of five races:!!White; Black or African American; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race
216 P3_069N !!Total:!!Population of two or more races:!!Population of five races:!!Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race
218 P3_071N !!Total:!!Population of two or more races:!!Population of six races:!!White; Black or African American; American Indian and Alaska Native; Asian; Native Hawaiian and Other Pacific Islander; Some Other Race
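The count of 32 can be checked by enumeration. Every category that includes Black or African American is either Black alone or Black combined with a nonempty subset of the other five races, giving 1 + (2^5 − 1) = 32 categories. The short Python sketch below is not part of the study's R workflow and only verifies the count. It lists these combinations in the same race order used by the table labels.

```python
from itertools import combinations

# The six single-race categories used in table P3.
RACES = [
    "White",
    "Black or African American",
    "American Indian and Alaska Native",
    "Asian",
    "Native Hawaiian and Other Pacific Islander",
    "Some Other Race",
]

def black_inclusive_categories(races=RACES, target="Black or African American"):
    """Every race combination (alone or multiracial) that includes the target race."""
    others = [r for r in races if r != target]
    categories = []
    for k in range(len(others) + 1):          # k = 0 gives the target race alone
        for combo in combinations(others, k):
            # keep the ordering used in the census labels
            categories.append(tuple(r for r in races if r == target or r in combo))
    return categories

cats = black_inclusive_categories()
print(len(cats))  # 32
print("; ".join(cats[-1]))  # the six-race category, matching P3_071N's label
```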

Next, calculate new columns:
  • Black: the sum of all 32 columns shown above, each of which counts people for whom Black or African American is one of the racial categories with which they identify.
  • Total: a copy of the population 18 years or over, variable P3_001N.
  • PctBlack: Black / Total * 100.
  • CheckPct: the percentage of the population 18 years or older that is either White alone (P3_003N) or Black or African American as calculated above. In Alabama, we expect this to be close to 100% for most block groups. It should never exceed 100%, because the two groups do not overlap.
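The following is a minimal Python sketch of these column calculations for a single block group record. It assumes each record is a dictionary keyed by the P3 variable names (the study performs this step in R on the block group layer). The zero-population guard and the hypothetical check_block_groups() helper are illustrative additions, not steps taken from the study.

```python
# The 32 P3 variables that include Black or African American, alone or in combination.
BLACK_FIELDS = [
    "P3_004N", "P3_011N", "P3_016N", "P3_017N", "P3_018N", "P3_019N",
    "P3_027N", "P3_028N", "P3_029N", "P3_030N", "P3_037N", "P3_038N",
    "P3_039N", "P3_040N", "P3_041N", "P3_042N", "P3_048N", "P3_049N",
    "P3_050N", "P3_051N", "P3_052N", "P3_053N", "P3_058N", "P3_059N",
    "P3_060N", "P3_061N", "P3_064N", "P3_065N", "P3_066N", "P3_067N",
    "P3_069N", "P3_071N",
]

def add_race_columns(row):
    """Return a copy of one block group record with Black, Total, PctBlack and CheckPct.

    row: dict keyed by P3 variable names; missing Black-inclusive fields count as 0.
    """
    out = dict(row)
    black = sum(row.get(field, 0) for field in BLACK_FIELDS)
    total = row["P3_001N"]  # population 18 years and over
    out["Black"] = black
    out["Total"] = total
    if total > 0:
        out["PctBlack"] = 100 * black / total
        out["CheckPct"] = 100 * (row["P3_003N"] + black) / total  # White alone + Black
    else:
        out["PctBlack"] = out["CheckPct"] = None  # avoid dividing by zero
    return out

def check_block_groups(rows):
    """Return records whose CheckPct exceeds 100%, which would signal a summing error."""
    return [r for r in rows if r["CheckPct"] is not None and r["CheckPct"] > 100]

# Hypothetical record, not real census data.
example = add_race_columns({"P3_001N": 1000, "P3_003N": 600, "P3_004N": 350, "P3_011N": 20})
print(example["Black"], example["PctBlack"], example["CheckPct"])  # 370 37.0 97.0
```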

Save the results as blockgroups_calc.gpkg.

Map the percentage of the population 18 or over that is Black or African American.

Load districts

What layers are stored in districts.gpkg?

## Driver: GPKG 
## Available layers:
##    layer_name geometry_type features fields crs_name
## 1 districts21 Multi Polygon        7      4   WGS 84
## 2 districts23 Multi Polygon        7      4    NAD83
## 3 precincts20 Multi Polygon     1972      8    NAD83

Load the districts21 layer.

## Reading layer `districts21' from data source 
##   `C:\git\josephholler\OR-Gerrymander-Alabama\data\raw\public\districts.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 7 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -88.47323 ymin: 30.14443 xmax: -84.88825 ymax: 35.00803
## Geodetic CRS:  WGS 84

Notice that the coordinate reference system is WGS 84, but it should be NAD 1983 (EPSG:4269) to match the study CRS. Transform the CRS. Also, calculate the percentage of the population that is Black (BLACK / POPULATION * 100).

Map the districts over the Black population.


Estimate the White and Black voting age populations of each district using areal weighted reaggregation (AWR) of block groups. Why do this when the POPULATION, BLACK, and WHITE variables are already in the table? First, those variables count the total population, but we care more about the voting age population. Second, we may want to categorize and calculate the Black population differently than the State of Alabama does.
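The idea behind AWR can be sketched as follows. Each block group's count is split among districts in proportion to the share of its area that falls inside each district. This assumes that population is spread evenly within each block group. To keep the example self-contained, this hypothetical Python version uses axis-aligned rectangles (the Box class) instead of real polygons and assumes planar coordinates. The study performs the overlay in R with st_intersection().

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned rectangle standing in for a polygon (hypothetical simplification)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self):
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)

    def intersection_area(self, other):
        w = min(self.xmax, other.xmax) - max(self.xmin, other.xmin)
        h = min(self.ymax, other.ymax) - max(self.ymin, other.ymin)
        return max(0.0, w) * max(0.0, h)

def areal_weighted_estimate(zone, sources):
    """Sum each source count weighted by the share of the source's area inside zone.

    sources: list of (Box, count) pairs, e.g. block groups with voting age population.
    """
    total = 0.0
    for box, count in sources:
        a = box.area()
        if a > 0:
            total += count * box.intersection_area(zone) / a
    return total

# Hypothetical block groups: (geometry, voting age population, Black voting age population)
block_groups = [(Box(0, 0, 2, 2), 100, 40), (Box(2, 0, 4, 2), 50, 10)]
district = Box(1, 0, 3, 2)  # covers half of each block group
total = areal_weighted_estimate(district, [(b, t) for b, t, _ in block_groups])
black = areal_weighted_estimate(district, [(b, k) for b, _, k in block_groups])
print(total, black)  # 75.0 25.0
```

Because the estimates are area-weighted, they are generally not whole numbers. This is why bgTotal and bgBlack in the results table have decimals.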

It turns out that R uses a spatial index to optimize the first dataset in a spatial query or overlay, but not the second. Therefore, pass the more complex dataset to st_intersection() first; the difference in run time is remarkable.
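To illustrate why the choice of indexed dataset matters, here is a hypothetical Python sketch of a simple uniform-grid index over bounding boxes. Real libraries such as GEOS use more sophisticated tree structures. The index is built once on one dataset. Each feature of the other dataset is then compared only against candidates in the grid cells it touches, not against every feature.

```python
import math
from collections import defaultdict

def bbox_overlap(a, b):
    """True if bounding boxes a and b, each (xmin, ymin, xmax, ymax), overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

class GridIndex:
    """Hypothetical uniform-grid index: each box is registered in every cell it covers."""

    def __init__(self, boxes, cell):
        self.boxes = boxes
        self.cell = cell
        self.cells = defaultdict(list)
        for i, box in enumerate(boxes):
            for key in self._cells_for(box):
                self.cells[key].append(i)

    def _cells_for(self, box):
        c = self.cell
        for gx in range(math.floor(box[0] / c), math.floor(box[2] / c) + 1):
            for gy in range(math.floor(box[1] / c), math.floor(box[3] / c) + 1):
                yield gx, gy

    def query(self, box):
        """Indices of indexed boxes overlapping box, checking only boxes in shared cells."""
        hits = set()
        for key in self._cells_for(box):
            for i in self.cells.get(key, ()):
                if i not in hits and bbox_overlap(self.boxes[i], box):
                    hits.add(i)
        return sorted(hits)

# Usage: index one dataset once, then query it with each feature of the other.
districts = [(0, 0, 50, 50), (50, 0, 100, 50)]
block_groups = [(x, y, x + 5, y + 5) for x in range(0, 100, 5) for y in range(0, 50, 5)]
index = GridIndex(block_groups, cell=10)
print([len(index.query(d)) for d in districts])
```

Only bounding boxes are compared here. The exact geometric intersection would then be computed for the candidate pairs alone.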

Spatial indices in R: https://r-spatial.org/r/2017/06/22/spatial-index.html

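Report the results. The district percentages of Black or African American people (pctBlack, based on total population) and the block group AWR estimates (pctBlackbg, based on voting age population) are very similar, differing by less than one percentage point in every district.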

DISTRICT  POPULATION   WHITE   BLACK  pctBlack   bgTotal    bgBlack  pctBlackbg
       1      717754  461324  186921      26.0  557342.4  142843.24        25.6
       2      717755  433244  217392      30.3  558173.9  168697.51        30.2
       3      717754  479432  176953      24.7  564208.0  141086.63        25.0
       4      717754  582698   51929       7.2  556586.0   42949.26         7.7
       5      717754  499707  124642      17.4  561381.6  101418.26        18.1
       6      717754  498843  138019      19.2  551752.6  104977.72        19.0
       7      717754  265204  400306      55.8  567451.7  312326.83        55.0
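The similarity can be quantified from the table itself. This Python sketch uses the rounded values shown in the table, so each difference is only accurate to about a tenth of a percentage point. Note also that pctBlack refers to the total population, while pctBlackbg refers to the voting age population.

```python
# Values copied from the results table: district -> (pctBlack, pctBlackbg)
RESULTS = {1: (26.0, 25.6), 2: (30.3, 30.2), 3: (24.7, 25.0), 4: (7.2, 7.7),
           5: (17.4, 18.1), 6: (19.2, 19.0), 7: (55.8, 55.0)}

def differences(results):
    """Percentage-point difference: block group AWR estimate minus district figure."""
    return {d: round(bg - dist, 1) for d, (dist, bg) in results.items()}

diffs = differences(RESULTS)
largest = max(diffs, key=lambda d: abs(diffs[d]))
print(diffs)
print(largest, diffs[largest])  # 7 -0.8
```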

Join the convex hull estimates to the districts with block group estimates.
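Below is a hypothetical Python sketch of the join and comparison. It assumes that each district and each convex hull already has areal-weighted estimates of the Black and total voting age populations, stored as (black, total) pairs keyed by district number. The absolute difference in percentage points is the kind of representational difference this study compares against compactness.

```python
def join_and_compare(district_est, hull_est):
    """Join hull estimates to district estimates by district id and compare percentages.

    district_est, hull_est: dicts mapping district id -> (black, total).
    Raises KeyError if a district has no matching hull estimate.
    """
    rows = {}
    for d, (black_d, total_d) in district_est.items():
        black_h, total_h = hull_est[d]
        pct_d = 100 * black_d / total_d
        pct_h = 100 * black_h / total_h
        rows[d] = {"pct_district": pct_d, "pct_hull": pct_h,
                   "abs_diff": abs(pct_h - pct_d)}
    return rows

# Hypothetical values, not study results.
rows = join_and_compare({1: (250, 1000)}, {1: (300, 1500)})
print(rows[1]["abs_diff"])  # 5.0
```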

Calculate compactness scores based on:

  • the area and perimeter
  • the area and the area of the convex hull
  • the area and the area of the minimum bounding circle
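The following is a minimal Python sketch of the three scores for a single polygon ring, assuming planar coordinates and no holes. Here the area and perimeter score is computed as the Polsby-Popper ratio 4πA/P². The convex hull score is A divided by the hull area. The minimum bounding circle score, often called the Reock score, is A divided by the circle's area. The brute-force circle search checks every pair and triple of hull vertices. This is correct but only practical for small vertex counts. The study uses sf and lwgeom functions in R instead.

```python
import math
from itertools import combinations

def area(ring):
    """Planar polygon area by the shoelace formula (first vertex not repeated)."""
    n = len(ring)
    twice = sum(ring[i][0] * ring[(i + 1) % n][1] - ring[(i + 1) % n][0] * ring[i][1]
                for i in range(n))
    return abs(twice) / 2

def perimeter(ring):
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n))

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def circle_two(a, b):
    """Circle with segment ab as its diameter, as (cx, cy, r)."""
    return (a[0] + b[0]) / 2, (a[1] + b[1]) / 2, math.dist(a, b) / 2

def circle_three(a, b, c):
    """Circumcircle of triangle abc, or None if the points are (nearly) collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    sa, sb, sc = ax**2 + ay**2, bx**2 + by**2, cx**2 + cy**2
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return ux, uy, math.dist((ux, uy), a)

def min_bounding_circle(points):
    """Smallest enclosing circle by brute force over pairs and triples (O(n^4))."""
    pts = list(points)
    if len(pts) == 1:
        return pts[0][0], pts[0][1], 0.0
    candidates = [circle_two(a, b) for a, b in combinations(pts, 2)]
    for t in combinations(pts, 3):
        c = circle_three(*t)
        if c is not None:
            candidates.append(c)
    best = None
    for cx, cy, r in candidates:
        # relative tolerance so points on the boundary count as enclosed
        if all(math.dist((cx, cy), p) <= r * (1 + 1e-9) + 1e-12 for p in pts):
            if best is None or r < best[2]:
                best = (cx, cy, r)
    return best

def compactness_scores(ring):
    a = area(ring)
    hull = convex_hull(ring)
    _, _, r = min_bounding_circle(hull)  # the circle around the hull encloses the polygon
    return {
        "polsby_popper": 4 * math.pi * a / perimeter(ring) ** 2,  # area and perimeter
        "convex_hull": a / area(hull),                           # area and hull area
        "reock": a / (math.pi * r ** 2),                          # area and circle area
    }

print(compactness_scores([(0, 0), (4, 0), (4, 1), (0, 1)]))
```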

This code block takes some time to run because of the st_minimum_bounding_circle() function.

Note: To knit, will we need to replace st_perimeter() with st_length(st_cast(geom, "MULTILINESTRING"))?

Plot representational difference against compactness

Create a scatterplot with the (absolute) difference in representation on the x axis and compactness on the y axis. Plot the three compactness scores together in different colors, and symbolize the districts with different shapes.


There is a negative relationship between convex hull compactness and convex hull representational difference, and between minimum bounding circle compactness and convex hull representational difference.
The exceptions are Districts 5 and 7. District 7 really is gerrymandered (it packs African American voters), but the minimum bounding circle method does not identify it as such. District 5 is not really gerrymandered, even though the minimum bounding circle method identifies it as such.

The shape and convex hull compactness scores exhibit a positive correlation.
The minimum bounding circle score is positively correlated with both the shape score and the convex hull score, with the exception of District 5.
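These relationships could be quantified with a correlation coefficient, computed both with and without the exceptional district. Below is a minimal Python sketch of Pearson's r with an optional exclusion list. It assumes the scores are stored in dictionaries keyed by district number. The usage values are hypothetical, not the study's results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (raises ZeroDivisionError if either input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def pearson_excluding(scores_a, scores_b, exclude=()):
    """Correlate two score dicts keyed by district, dropping districts listed in exclude."""
    keys = sorted(k for k in scores_a if k not in exclude)
    return pearson([scores_a[k] for k in keys], [scores_b[k] for k in keys])

# Hypothetical scores for four districts; district 4 is the outlier.
a = {1: 1, 2: 2, 3: 3, 4: 4}
b = {1: 2, 2: 4, 3: 6, 4: 0}
print(pearson_excluding(a, b), pearson_excluding(a, b, exclude=(4,)))
```

With only seven districts, a single exception can reverse the sign of r, so reporting both values may be more informative than either alone.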

District 5 is elongated, but otherwise compact.
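A worked example shows why an elongated but otherwise compact shape can score poorly on the minimum bounding circle measure. It uses a hypothetical 1 × k rectangle as a stand-in for District 5, not its actual geometry. Here A is the area, P the perimeter, and r the radius of the minimum bounding circle.

```latex
\begin{aligned}
A &= k, \qquad P = 2(1 + k), \qquad r = \tfrac{1}{2}\sqrt{1 + k^2} \\
\text{Polsby-Popper} &= \frac{4\pi A}{P^2} = \frac{\pi k}{(1 + k)^2} \\
\text{Convex hull ratio} &= \frac{A}{A_{\text{hull}}} = 1 \\
\text{Minimum bounding circle (Reock)} &= \frac{A}{\pi r^2} = \frac{4k}{\pi (1 + k^2)}
\end{aligned}
```

For k = 4, Polsby-Popper ≈ 0.503, the convex hull ratio is 1, and the bounding circle score is 16/(17π) ≈ 0.300. The convex hull ratio ignores elongation, while the bounding circle score keeps falling as k grows. This is consistent with the minimum bounding circle alone flagging District 5.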

Analysis

Describe the methods of analysis that will directly test the hypotheses or provide results to answer the research questions. This section should explicitly define any spatial / statistical models and their parameters, including grouping criteria, weighting criteria, and significance thresholds. Also explain any follow-up analyses or validations.

Results

Describe how results are to be presented.

Discussion

Describe how the results are to be interpreted vis-à-vis each hypothesis or research question.

Integrity Statement

Include an integrity statement - The authors of this preregistration state that they completed this preregistration to the best of their knowledge and that no other preregistration exists pertaining to the same hypotheses and research. If a prior registration does exist, explain the rationale for revising the registration here.

Acknowledgements

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI: 10.17605/OSF.IO/W29MQ

References

Cheng, Joe, Carson Sievert, Barret Schloerke, Winston Chang, Yihui Xie, and Jeff Allen. 2024. Htmltools: Tools for HTML. https://github.com/rstudio/htmltools.
Müller, Kirill. 2020. Here: A Simpler Way to Find Your Files. https://here.r-lib.org/.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
———. 2024a. Lwgeom: Bindings to Selected Liblwgeom Functions for Simple Features. https://r-spatial.github.io/lwgeom/.
———. 2024b. Sf: Simple Features for R. https://r-spatial.github.io/sf/.
Pebesma, Edzer, and Roger Bivand. 2023. Spatial Data Science: With applications in R. Chapman and Hall/CRC. https://doi.org/10.1201/9780429459016.
R Core Team. 2024. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Tennekes, Martijn. 2018. “tmap: Thematic Maps in R.” Journal of Statistical Software 84 (6): 1–39. https://doi.org/10.18637/jss.v084.i06.
———. 2025. Tmap: Thematic Maps. https://github.com/r-tmap/tmap.
Walker, Kyle, and Matt Herman. 2025. Tidycensus: Load US Census Boundary and Attribute Data as Tidyverse and Sf-Ready Data Frames. https://walker-data.com/tidycensus/.
Wickham, Hadley. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://tidyverse.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Xie, Yihui. 2014. “Knitr: A Comprehensive Tool for Reproducible Research in R.” In Implementing Reproducible Computational Research, edited by Victoria Stodden, Friedrich Leisch, and Roger D. Peng. Chapman and Hall/CRC.
———. 2015. Dynamic Documents with R and Knitr. 2nd ed. Boca Raton, Florida: Chapman and Hall/CRC. https://yihui.org/knitr/.
———. 2024. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.org/knitr/.
Xie, Yihui, JJ Allaire, and Jeffrey Horner. 2024. Markdown: Render Markdown with Commonmark. https://github.com/rstudio/markdown.